Audio-visual speaker identification using the CUAVE database

نویسندگان

  • David Dean
  • Patrick Lucey
  • Sridha Sridharan
چکیده

The freely available nature of the CUAVE database allows it to provide a valuable platform to form benchmarks and compare research. This paper shows that the CUAVE database can successfully be used to test speaker identifications systems, with performance comparable to existing systems implemented on other databases. Additionally, this research shows that the optimal configuration for decision-fusion of an audio-visual speaker identification system relies heavily on the video modality in all but clean speech conditions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

COVER SHEET Dean

The freely available nature of the CUAVE database allows it to provide a valuable platform to form benchmarks and compare research. This paper shows that the CUAVE database can successfully be used to test speaker identifications systems, with performance comparable to existing systems implemented on other databases. Additionally, this research shows that the optimal configuration for decision-...

متن کامل

Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multipl...

متن کامل

An audio-visual approach to simultaneous-speaker speech recognition

Audio-visual speech recognition is an area with great potential to help solve challenging problems in speech processing. Difficulties due to background noises are significantly reduced by the additional information provided by extra visual features. The presence of additional speech from other talkers during recording may be viewed as one of the most difficult sources of noise. This paper prese...

متن کامل

Region of Interest Extraction using Colour Based Methods on the CUAVE Database

Region of interest (ROI) extraction is an important step in deriving visual features for an audio-visual speech recognition system. Colour based segmentation offers the potential of computationally inexpensive algorithms for ROI selection. This paper presents a comparative study of two colour based techniques, one using hue and accumulated difference, the other chrominance. Results are presente...

متن کامل

Video Augmentation for Improving Audio Speech Recognition under Noise

For the recognition of speech, in particular spoken digits, captured in video with poor sound due to noise, we develop a novel audio-visual fusion technique that performs significantly better than utilising either audio or video signal alone. Specifically, we present an audio-visual intermediate fusion strategy to locate speaker dependant pronounced digits in continuous video recorded with soun...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005